HADOOP-19494: [ABFS] Fix Case Sensitivity Issue for hdi_isfolder metadata#7496
HADOOP-19494: [ABFS] Fix Case Sensitivity Issue for hdi_isfolder metadata#7496anujmodi2021 merged 4 commits intoapache:trunkfrom
Conversation
|
🎊 +1 overall
This message was automatically generated. |
============================================================
|
| String resourceType = result.getResponseHeader(dirHeaderName); | ||
| return resourceType != null && resourceType.equals(TRUE); |
There was a problem hiding this comment.
Since getResponseHeader method returns either empty string or true- should we check for resourceType != EMPTY_STRING instead of null?
There was a problem hiding this comment.
In the case of an empty string, resourceType.equals(TRUE) will return false, so there’s no need for an explicit check for an empty string.
The null case occurs when the header we are looking for does not exist, which is why this check is needed.
| } | ||
|
|
||
| String resourceType = result.getResponseHeader(dirHeaderName); | ||
| return resourceType != null && resourceType.equals(TRUE); |
There was a problem hiding this comment.
Yes since we already checked for null in the getHeaderNameIgnoreCase method, this repeated check may not be needed
There was a problem hiding this comment.
Since it was an existing line of code, I didn’t change it. However, it does make sense to remove this null check.
| @Override | ||
| public Map<String, List<String>> getResponseHeaders() { | ||
| return new HashMap<>(); | ||
| return Collections.singletonMap(X_MS_META_HDI_ISFOLDER, |
There was a problem hiding this comment.
Why are we adding this entry in the map for all getResponse header calls ?
There was a problem hiding this comment.
AbfsHttpOperationWithFixedResultForGetFileStatus is used in one place where we are checking for an implicit directory. In the case of an implicit directory, I am applying the same logic as in the method httpHeader.equalsIgnoreCase(X_MS_META_HDI_ISFOLDER), where we return true if the header is X_MS_META_HDI_ISFOLDER.
| * Test mkdirs with HDI folder configuration, | ||
| * verifying the correct header and directory state. | ||
| */ | ||
| @Test |
There was a problem hiding this comment.
Add test for other variations such as ALL Caps, only one later caps, all small. Also one test can be when we set multiple metadata keys, the directory one is correctly identified.
There was a problem hiding this comment.
Sure, will add the test cases.
anujmodi2021
left a comment
There was a problem hiding this comment.
Added some thoughts.
| @Override | ||
| public boolean checkIsDir(AbfsHttpOperation result) { | ||
| String resourceType = result.getResponseHeader(X_MS_META_HDI_ISFOLDER); | ||
| String dirHeaderName = getHeaderNameIgnoreCase(result, X_MS_META_HDI_ISFOLDER); |
There was a problem hiding this comment.
Won't it be better to directly get the value from this function? Something like:
getCaseInsensitiveResponseHeader() similar to getResponseHeader
And this can be moved to AbfsHttpOperation class itself where normal version of this method already exists.
There was a problem hiding this comment.
It can be later used by other classes and other headers as well not only BlobClient.
| intercept(FileAlreadyExistsException.class, () -> fs.mkdirs(new Path("/testFilePath"))); | ||
| intercept(FileAlreadyExistsException.class, () -> fs.mkdirs(new Path("/testFilePath/newDir"))); | ||
| } | ||
|
|
There was a problem hiding this comment.
Do we need to add tests for this change in mkdir test file?
To me it looks like more of a metadata change.
We should add tests in ITestFileStatus, ITestListStatus, ITestAttributes classes.
All of them have dependency on this header to return right response.
Listing output also depends on this header is that also case-insensitive? Can we add some tests for that.
There was a problem hiding this comment.
The reason for adding these test cases here is we are performing mkdir calls with different cases of Hdi_isfolder. Therefore, the focus is on how mkdir behaves in these scenarios.
But yes, it makes sense it is a metadata change so we can move it to other class if that seems more appropriate to you.
anujmodi2021
left a comment
There was a problem hiding this comment.
Thanks for taking suggestions.
LGTM.
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
============================================================
|
|
|
||
| /**{@inheritDoc}*/ | ||
| @Override | ||
| public String getResponseHeaderIgnoreCase(final String headerName) { |
There was a problem hiding this comment.
If both the children are using the same method implementation, same can be added in parent itself if it won't change in future for other children if added
There was a problem hiding this comment.
There are three more implementations of AbfsHttpOperation where the implementation is different. Thats the reason I have implemented it separately for all.
…isfolder metadata (apache#7496) Contributed by Manish Bhatt Reviewed by Anmol, Manika, Anuj Signed off by: Anuj Modi<[email protected]>
…isfolder metadata (apache#7496) Contributed by Manish Bhatt Reviewed by Anmol, Manika, Anuj Signed off by: Anuj Modi<[email protected]>
…isfolder metadata (#7496) (#7584) Contributed by Manish Bhatt Reviewed by Anmol, Manika, Anuj Signed off by: Anuj Modi<[email protected]>
Jira: https://issues.apache.org/jira/browse/HADOOP-19494
In the blob endpoint, we determine whether the path is a file, or a directory based on the metadata attribute hdi_isfolder. When creating a directory, we set hdi_isfolder to true. Currently, our method for checking if the path is a directory involves a case-sensitive equality check. Consequently, if someone configures a directory with Hdi_isfolder, the driver will not recognize that path as a directory. We need to address this issue because, in the backend, hdi_isfolder and Hdi_isfolder are considered the same metadata attribute. Therefore, the solution involves modifying our equality check to be case-insensitive, ensuring that the driver correctly identifies directories regardless of case variations in the metadata attribute.